Overview:
-
Samples are subsets of an entire dataset. The whole dataset is called as population.
-
It is difficult and inefficient to conduct surveys or tests on the whole population. Hence sampling is employed to draw a subset with which tests or surveys will be conducted to derive inferences about the population.
-
During the sampling process, if all the members of the population have an equal probability of getting into the sample and if the samples are randomly selected, the process is called Uniform Random Sampling.
-
If some of the items are assigned more or less weights than their uniform probability of selection, the sampling process is called Weighted Random Sampling.
-
The pandas DataFrame class provides the method sample() that returns a random sample from the DataFrame.
Example 1 - Explicitly specify the sample size:
# Example Python program that creates a random sample # Age vs call duration # Random_state makes the random number generator to produce |
Output:
Random sample: |
Example 2 - Specify the sample size as a fraction of the population size:
# Example python program that samples # Uses FiveThirtyEight Comic Characters Dataset comicData = "/data/dc-wikia-data.csv"; # Sample size as 1% of the population |
Output:
(Rows, Columns) - Population: [69 rows x 13 columns] |
Example 3 - Random sampling using weights:
# Example Python program that creates a random sample # TimeToReach vs distance dataFrame = pds.DataFrame(data=time2reach); # Random_state makes the random number generator to produce print("Random sample using weights:"); |
Output:
Random sample using weights: Distance TimeToReach 4 30 40 9 55 70 6 40 50 7 45 60 8 50 65 |